
Ship static busybox shell in gpu-operator image #2434

Draft
rajathagasthya wants to merge 1 commit into NVIDIA:main from rajathagasthya:worktree-distroless-dev

rajathagasthya commented May 6, 2026

Flip the base from *-dev* to non-*-dev* distroless and source a static busybox from debian:trixie-slim. Init container wrappers, lifecycle hooks, and helper scripts continue to work via /bin/sh and busybox applet symlinks layered into the final image.

Part of NVIDIA/cloud-native-team#299.
Resolves #2435.

rajathagasthya force-pushed the worktree-distroless-dev branch 5 times, most recently from 19fd65d to 14f5202, May 6, 2026 20:25
rajathagasthya force-pushed the worktree-distroless-dev branch from 14f5202 to 20e9691, May 7, 2026 15:50
rajathagasthya force-pushed the worktree-distroless-dev branch from 20e9691 to 9e3efb2, May 19, 2026 18:05
rajathagasthya force-pushed the worktree-distroless-dev branch from 9e3efb2 to 448be34, May 19, 2026 18:36
rajathagasthya added a commit to rajathagasthya/mig-parted that referenced this pull request May 19, 2026
The gpu-operator mounts a ConfigMap-backed `entrypoint.sh` into the
nvidia-mig-manager container today: it waits for the driver-ready
file, sources it as KEY=value env, derives
`WITH_SHUTDOWN_HOST_GPU_CLIENTS=$IS_HOST_DRIVER`, and execs
`nvidia-mig-manager`. That script requires a shell in the container
image, which is currently provided by the `-dev` distroless variant
via a busybox `/bin/sh` symlink. NVIDIA STIG policy is removing
`-dev` distroless variants from its approved parent images, so the
shell has to go, and the entrypoint logic has to live in the binary.

Move startup hooks into `nvidia-mig-manager` itself. A new
`internal/startup` package provides `WaitForFile` (polls
`os.Stat`) and `SourceEnvFile` (parses `KEY=value` lines with quote
and comment handling, calls `os.Setenv`). `main()` runs the hooks
before `cli.App.Run` parses flags, so any env vars sourced from
`driver-ready` are visible to the `EnvVars:` declarations on each
cli.Flag. The hooks are opt-in via env vars:

- `WAIT_FOR_DRIVER_READY=<path>` — block on the file's existence
- `DRIVER_ENV_FILE=<path>` — source KEY=value into the process env
- `WAIT_FOR_FILE_INTERVAL=<duration>` — poll interval, default 5s

After sourcing, `IS_HOST_DRIVER` is mirrored into
`WITH_SHUTDOWN_HOST_GPU_CLIENTS` for backward compatibility with
the existing shell behavior. The cli flag picks up the env var as
usual.
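
The hooks described above could look roughly like the following Go sketch. This is not the code from the mig-parted commit: the lower-case function names, their signatures, the `runStartupHooks` helper, and the exact quote and comment rules are assumptions made for illustration. Only the env var names, the 5s default, and the use of `os.Stat` and `os.Setenv` come from the commit message.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

// waitForFile polls os.Stat until path exists (hypothetical signature).
func waitForFile(path string, interval time.Duration) error {
	for {
		_, err := os.Stat(path)
		if err == nil {
			return nil
		}
		if !os.IsNotExist(err) {
			return fmt.Errorf("stat %s: %w", path, err)
		}
		time.Sleep(interval)
	}
}

// parseEnvLine splits one KEY=value line. Blank lines and '#' comments are
// skipped; one pair of matching surrounding quotes is stripped (assumed rules).
func parseEnvLine(line string) (key, value string, ok bool) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return "", "", false
	}
	k, v, found := strings.Cut(line, "=")
	key = strings.TrimSpace(k)
	if !found || key == "" {
		return "", "", false
	}
	value = strings.TrimSpace(v)
	if n := len(value); n >= 2 && (value[0] == '"' || value[0] == '\'') && value[n-1] == value[0] {
		value = value[1 : n-1]
	}
	return key, value, true
}

// sourceEnvFile exports every KEY=value pair found in path into the process env.
func sourceEnvFile(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if k, v, ok := parseEnvLine(sc.Text()); ok {
			if err := os.Setenv(k, v); err != nil {
				return err
			}
		}
	}
	return sc.Err()
}

// runStartupHooks applies the opt-in hooks in the order the commit describes.
func runStartupHooks() error {
	interval := 5 * time.Second // default from the commit message
	if s := os.Getenv("WAIT_FOR_FILE_INTERVAL"); s != "" {
		d, err := time.ParseDuration(s)
		if err != nil || d <= 0 {
			return fmt.Errorf("invalid WAIT_FOR_FILE_INTERVAL %q", s)
		}
		interval = d
	}
	if p := os.Getenv("WAIT_FOR_DRIVER_READY"); p != "" {
		if err := waitForFile(p, interval); err != nil {
			return err
		}
	}
	if p := os.Getenv("DRIVER_ENV_FILE"); p != "" {
		if err := sourceEnvFile(p); err != nil {
			return err
		}
	}
	// Mirror IS_HOST_DRIVER for compatibility with the old shell entrypoint.
	if v, ok := os.LookupEnv("IS_HOST_DRIVER"); ok {
		return os.Setenv("WITH_SHUTDOWN_HOST_GPU_CLIENTS", v)
	}
	return nil
}

func main() {
	// The real binary would call cli.App.Run after this, once the env is populated.
	if err := runStartupHooks(); err != nil {
		fmt.Fprintln(os.Stderr, "startup hooks:", err)
		os.Exit(1)
	}
}
```

With none of the three env vars set, the sketch does nothing and falls through, which matches the opt-in behavior described above. Running the hooks before flag parsing is what lets values sourced from the driver-ready file reach flags that read their defaults from the environment.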

Drop the `SHELL ["/busybox/sh", "-c"]` directive and the
`RUN ln -s /busybox/sh /bin/sh && rm -r /var/run && ln -s /run
/var/run` step from the Dockerfile, and flip the base from
`distroless/go:v4.0.5-dev` to `v4.0.5`. The `/var/run` -> `/run`
symlink is provided by the non-`-dev` distroless base.

Companion to NVIDIA/gpu-operator#2434, which removes the
`nvidia-mig-manager-entrypoint` ConfigMap and updates the
state-mig-manager DaemonSet to invoke `nvidia-mig-manager` directly
with the new env vars set on the container spec.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
rajathagasthya force-pushed the worktree-distroless-dev branch 2 times, most recently from acd7fa9 to 8d75aec, May 20, 2026 03:06
rajathagasthya changed the title from "Remove shell dependency from validator pods" to "Ship static busybox shell in gpu-operator image", May 20, 2026
Flip the base from *-dev* to non-*-dev* distroless and source a static
busybox from debian:trixie-slim. Init container wrappers, lifecycle
hooks, and helper scripts continue to work via /bin/sh and busybox
applet symlinks layered into the final image.

Part of NVIDIA/cloud-native-team#299.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
rajathagasthya force-pushed the worktree-distroless-dev branch from 8d75aec to f6ed616, May 20, 2026 04:59